A human-human train timetable dialogue corpus
Abstract
This paper describes progress in the development of a human-human dialogue corpus. The corpus contains transcribed phone calls from users to a train timetable information center. The calls consist of inquiries about the users' train travel plans. The corpus is based on transcriptions of user inquiries that were previously collected for a train timetable information center. We enriched these transcriptions with dialogue act tags, which incorporate an abstract semantic annotation. The corpus comprises recorded speech of both operators and users, an orthographic transcription, a normalized transcription, a normalized transcription with named entities, and dialogue act tags with abstract semantic annotation. A combination of a dialogue act tagset and an abstract semantic annotation is proposed. A technique for dialogue act tagging and abstract semantic annotation is described and applied.
Similar resources
Czech-Sign Speech Corpus for Semantic Based Machine Translation
This paper describes progress in the development of a human-human dialogue corpus for machine translation of spoken language. We have chosen a semantically annotated corpus of phone calls to a train timetable information center. The phone calls consist of inquiries regarding the callers' train travel plans. The corpus dialogue act tags incorporate abstract semantic meaning. We have enriched a part of th...
Use of Negative Examples in Training the HVS Semantic Model
This paper describes the use of negative examples in training the HVS semantic model. We present a novel initialization of the lexical model using negative examples extracted automatically from a semantic corpus, as well as a description of an algorithm for extracting these examples. We evaluated the use of negative examples on a closed-domain human-human train timetable dialogue corpus. We significan...
Robust dialogue-state dependent language modeling using leaving-one-out
The use of dialogue-state dependent language models in automatic inquiry systems can improve speech recognition and understanding if a reasonable prediction of the dialogue state is feasible. In this paper, the dialogue state is defined as the set of parameters which are contained in the system prompt. For each dialogue state a separate language model is constructed. In order to obtain robust l...
Multi-feature Error Detection in Spoken Dialogue Systems
The present paper evaluates the role selected features and feature combinations play for error detection in spoken dialogue systems. We investigate the relevance of various, readily available features extracted from a corpus of dialogues with a train timetable information system, using RIPPER, a rule-inducing machine learning algorithm. The learning task consists of the identification of commun...
Corpus-Based Information Presentation for a Spoken Public Transport Information System
The Alparon project aims to improve VIOS, Openbaar Vervoer Reisinformatie's (OVR) automated speech processing system for public transport information, by using a corpus-based approach. The shortcomings of the current system have been investigated, and a study is made of how dialogues in the OVR domain usually occur between a human operator and a client. While centering our attention on the pres...